{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "sZtfr2Gyx_qM"
      },
      "outputs": [],
      "source": [
        "# Copyright 2025 Google LLC\n",
        "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#     https://www.apache.org/licenses/LICENSE-2.0\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "LbBTb55fx_qN"
      },
      "source": [
        "\n",
        "## **Image-Prompt Alignment**\n",
        "\n",
        "This Eval Recipe demonstrates how to use a prompt alignment autorater to compare the image generation quality of two models (Imagen 2 and Imagen 3) using the [Vertex AI Evaluation Service](https://cloud.google.com/vertex-ai/generative-ai/docs/models/evaluation-overview)."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "AjNAklK4x_qN"
      },
      "source": [
        "<table align=\"left\">\n",
        "  <td style=\"text-align: center\">\n",
        "    <a href=\"https://art-analytics.appspot.com/r.html?uaid=G-FHXEFWTT4E&utm_source=aRT-image-prompt-alignment-eval&utm_medium=aRT-clicks&utm_campaign=image-prompt-alignment-eval&destination=image-prompt-alignment-eval&url=https%3A%2F%2Fcolab.research.google.com%2Fgithub%2FGoogleCloudPlatform%2Fapplied-ai-engineering-samples%2Fblob%2Fmain%2Fgenai-on-vertex-ai%2Fgemini%2Fmodel_upgrades%2Fimage_prompt_alignment%2Fvertex_colab%2Fimage_prompt_alignment_eval.ipynb\">\n",
        "      <img width=\"32px\" src=\"https://www.gstatic.com/pantheon/images/bigquery/welcome_page/colab-logo.svg\" alt=\"Google Colaboratory logo\"><br> Open in Colab\n",
        "    </a>\n",
        "  </td>\n",
        "  <td style=\"text-align: center\">\n",
        "    <a href=\"https://art-analytics.appspot.com/r.html?uaid=G-FHXEFWTT4E&utm_source=aRT-image-prompt-alignment-eval&utm_medium=aRT-clicks&utm_campaign=image-prompt-alignment-eval&destination=image-prompt-alignment-eval&url=https%3A%2F%2Fconsole.cloud.google.com%2Fvertex-ai%2Fcolab%2Fimport%2Fhttps%3A%252F%252Fraw.githubusercontent.com%252FGoogleCloudPlatform%252Fapplied-ai-engineering-samples%252Fmain%252Fgenai-on-vertex-ai%252Fgemini%252Fmodel_upgrades%252Fimage_prompt_alignment%252Fvertex_colab%252Fimage_prompt_alignment_eval.ipynb\">\n",
        "      <img width=\"32px\" src=\"https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN\" alt=\"Google Cloud Colab Enterprise logo\"><br> Open in Colab Enterprise\n",
        "    </a>\n",
        "  </td>\n",
        "  <td style=\"text-align: center\">\n",
        "    <a href=\"https://art-analytics.appspot.com/r.html?uaid=G-FHXEFWTT4E&utm_source=aRT-image-prompt-alignment-eval&utm_medium=aRT-clicks&utm_campaign=image-prompt-alignment-eval&destination=image-prompt-alignment-eval&url=https%3A%2F%2Fconsole.cloud.google.com%2Fvertex-ai%2Fworkbench%2Fdeploy-notebook%3Fdownload_url%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fapplied-ai-engineering-samples%2Fmain%2Fgenai-on-vertex-ai%2Fgemini%2Fmodel_upgrades%2Fimage_prompt_alignment%2Fvertex_colab%2Fimage_prompt_alignment_eval.ipynb\">\n",
        "      <img src=\"https://www.gstatic.com/images/branding/gcpiconscolors/vertexai/v1/32px.svg\" alt=\"Vertex AI logo\"><br> Open in Vertex AI Workbench\n",
        "    </a>\n",
        "  </td>\n",
        "  <td style=\"text-align: center\">\n",
        "    <a href=\"https://github.com/GoogleCloudPlatform/applied-ai-engineering-samples/blob/main/genai-on-vertex-ai/gemini/model_upgrades/image_prompt_alignment/vertex_colab/image_prompt_alignment_eval.ipynb\">\n",
        "      <img width=\"32px\" src=\"https://upload.wikimedia.org/wikipedia/commons/9/91/Octicons-mark-github.svg\" alt=\"GitHub logo\"><br> View on GitHub\n",
        "    </a>\n",
        "  </td>\n",
        "</table>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "YYtBxd6y-Ju2"
      },
      "source": [
        "- Use case: Image Generation\n",
        "\n",
        "- Dataset: This eval recipe uses two JSONL dataset files that are based on the same set of prompts and map each prompt to the images generated by [Imagen 2](https://storage.googleapis.com/gemini_assets/image_prompt_alignment/dataset_imagen2.jsonl) and [Imagen 3](https://storage.googleapis.com/gemini_assets/image_prompt_alignment/dataset_imagen3.jsonl).\n",
        "\n",
        "- Metric: We use an autorater inspired by [Gecko](https://arxiv.org/abs/2404.16820) that generates questions about all visually groundable aspects of the image, answers those questions, assigns a prompt alignment score based on the answers, and generates an explanation for each identified gap.\n",
        "\n",
        "Step 1 of 4: Configure eval settings\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "ZZnEA6GZ-kMW"
      },
      "outputs": [],
      "source": [
        "%%writefile .env\n",
        "PROJECT_ID=your-project-id            # Google Cloud Project ID\n",
        "LOCATION=us-central1                  # Region for all required Google Cloud services\n",
        "EXPERIMENT_NAME=eval-image-prompt-alignment    # Creates Vertex AI Experiment to track the eval runs\n",
        "MODEL_JUDGE=gemini-2.0-flash-001  # This model will run the autorater prompt\n",
        "DATASET_URI_IMAGEN2=\"gs://gemini_assets/image_prompt_alignment/dataset_imagen2.jsonl\"  # Evaluation dataset for Imagen 2\n",
        "DATASET_URI_IMAGEN3=\"gs://gemini_assets/image_prompt_alignment/dataset_imagen3.jsonl\"  # Evaluation dataset for Imagen 3"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "aqeOCM7k9t5h"
      },
      "source": [
        "Step 2 of 4: Install Python libraries"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": true,
        "id": "bR0rvHA3Lby6"
      },
      "outputs": [],
      "source": [
        "%pip install --upgrade --user --quiet google-cloud-aiplatform[evaluation] python-dotenv\n",
        "# Restart the runtime so the upgraded packages are loaded.\n",
        "# A \"session crashed\" error is expected. Please ignore it and proceed to the next cell.\n",
        "import IPython\n",
        "IPython.Application.instance().kernel.do_shutdown(True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "9OUfM5Lz9_aM"
      },
      "source": [
        "Step 3 of 4: Authenticate to Google Cloud (requires permission to open a popup window)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "VRFZFC6OLby7"
      },
      "outputs": [],
      "source": [
        "import os\n",
        "import sys\n",
        "import pandas as pd\n",
        "import vertexai\n",
        "from datetime import datetime\n",
        "from dotenv import load_dotenv\n",
        "from google import genai\n",
        "from google.cloud import storage\n",
        "from google.genai.types import Content, Part\n",
        "from vertexai.evaluation import EvalTask, CustomMetric\n",
        "\n",
        "load_dotenv(override=True)\n",
        "if os.getenv(\"PROJECT_ID\") == \"your-project-id\":\n",
        "    raise ValueError(\"Please configure your Google Cloud Project ID in the first cell.\")\n",
        "if \"google.colab\" in sys.modules:\n",
        "    from google.colab import auth\n",
        "    auth.authenticate_user()\n",
        "\n",
        "vertexai.init(project=os.getenv('PROJECT_ID'), location=os.getenv('LOCATION'))\n",
        "_gemini_client = genai.Client(vertexai=True, project=os.getenv('PROJECT_ID'), location=os.getenv('LOCATION'))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "CQvekvLt9SWD"
      },
      "source": [
        "Step 4 of 4: Evaluate images from Baseline and Candidate models and print the alignment scores"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "KfG9JG9VHaNw"
      },
      "outputs": [],
      "source": [
        "import json\n",
        "import os\n",
        "import pandas as pd\n",
        "import vertexai\n",
        "from datetime import datetime\n",
        "from IPython.display import clear_output\n",
        "from vertexai.evaluation import EvalTask, EvalResult, CustomMetric\n",
        "\n",
        "_AUTORATER_PROMPT_TEMPLATE = '''\n",
        "You are an expert image analyst with a keen eye for detail and a deep understanding of linguistics and human perception.\n",
        "\n",
        "# Definitions\n",
        "- **Visually Groundable Requirement:** A specific claim or requirement within the image description that can be verified or refuted by examining the visual content of the image. This includes descriptions of objects (existence and attributes like color, size, shape, or text on the object), spatial relationships between objects, actions depicted, or overall scene characteristics like lighting conditions.\n",
        "- **Gap:** A visually groundable requirement that is either contradicted by the image or cannot be directly confirmed based on the image.\n",
        "\n",
        "# Instructions\n",
        "Review the image and a description of that image located in the IMAGE_DESCRIPTION tag below.\n",
        "Your goal is to rate the accuracy of the image description on a scale of 0 to 100.\n",
        "You must use the following 6-step process and provide brief written notes for each step:\n",
        "- Step 1. Identify all Visually Groundable Requirements contained in IMAGE_DESCRIPTION and save them to a numbered list.\n",
        "- Step 2. Write a numbered list of true/false questions that should be asked about each of the identified requirements in order to verify whether each requirement is satisfied by the image or not.\n",
        "- Step 3. For each of the questions created in Step 2 write a brief analysis of the most relevant information in the provided image and then write the final answer:\n",
        "    - True only if the image contains a clear positive answer to this question.\n",
        "    - False if the image clearly justifies a negative answer to this question OR does not have enough information to answer this question.\n",
        "- Step 4. Calculate the number of questions that received the answer \"True\" in step 3.\n",
        "- Step 5. Calculate the final accuracy score as the percentage of positively answered questions out of the total questions answered in Step 3, rounded to the nearest integer.\n",
        "- Step 6. Write the final answer as a Markdown codeblock containing a single JSON object with two attributes:\n",
        "    - \"score\" with the integer value of the final accuracy score calculated in Step 5.\n",
        "    - \"gaps\" with a JSON array of strings that describe each gap (question that got a negative answer in Step 3). The description should be a one sentence statement that combines key information from the question and the analysis of relevant information from Step 3.\n",
        "\n",
        "<IMAGE_DESCRIPTION>\n",
        "{image_description}\n",
        "</IMAGE_DESCRIPTION>\n",
        "'''\n",
        "\n",
        "def load_text_file(gcs_uri: str) -> str:\n",
        "    blob = storage.Blob.from_string(gcs_uri, storage.Client())\n",
        "    return blob.download_as_text()\n",
        "\n",
        "def load_image(gcs_uri: str) -> bytes:\n",
        "    blob = storage.Blob.from_string(gcs_uri, storage.Client())\n",
        "    return blob.download_as_bytes()\n",
        "\n",
        "def load_dataset(dataset_uri: str):\n",
        "    '''Load the JSONL dataset into a Pandas DataFrame and download each image into the \"image\" column.'''\n",
        "    lines = load_text_file(dataset_uri).splitlines()\n",
        "    data = [json.loads(line) for line in lines if line.strip()]\n",
        "    df = pd.DataFrame(data)\n",
        "    df['image'] = df['image_uri'].apply(lambda image_uri: load_image(image_uri))\n",
        "    return df[['image_uri', 'prompt', 'image']]\n",
        "\n",
        "def image_prompt_alignment_autorater(record: dict) -> dict:\n",
        "    '''Custom metric function for scoring prompt alignment between the image and prompt from the given dataset record.'''\n",
        "    response = _gemini_client.models.generate_content(\n",
        "        model=os.getenv('MODEL_JUDGE'),\n",
        "        contents=[\n",
        "            Content(role='user', parts=[Part(text=_AUTORATER_PROMPT_TEMPLATE.format(image_description=record['prompt']))]),\n",
        "            Content(role='user', parts=[Part.from_bytes(data=record['image'], mime_type='image/jpeg')])\n",
        "        ]\n",
        "    )\n",
        "    # Extract the JSON payload from the Markdown code block required by Step 6 of the autorater prompt.\n",
        "    json_output = json.loads(response.text.split('```json\\n')[1].split('\\n```')[0])\n",
        "    return {\n",
        "        \"image_prompt_alignment\": json_output['score'],\n",
        "        \"explanation\": '\\n'.join(json_output['gaps'])\n",
        "    }\n",
        "\n",
        "def print_scores_and_explanations(title: str, eval_result: EvalResult) -> None:\n",
        "    print(f'\\n{\"-\"*80}\\nRESULTS FOR {title}:')\n",
        "    for i, row in eval_result.metrics_table.iterrows():\n",
        "        gaps = row[\"image_prompt_alignment/explanation\"]\n",
        "        gaps = f', GAPS: {gaps}' if gaps else ''\n",
        "        print(f'{row[\"image_uri\"]}: SCORE={row[\"image_prompt_alignment/score\"]}%{gaps}')\n",
        "\n",
        "def run_eval(model: str, dataset_uri: str, experiment_name: str):\n",
        "    '''Rate the alignment between image generation prompts and the generated images and identify gaps using a custom autorater.'''\n",
        "    timestamp = datetime.now().strftime('%b-%d-%H-%M-%S').lower()\n",
        "    dataset = load_dataset(dataset_uri)\n",
        "    task = EvalTask(\n",
        "        dataset=dataset,\n",
        "        metrics=[CustomMetric(name=\"image_prompt_alignment\", metric_function=image_prompt_alignment_autorater)],\n",
        "        experiment=experiment_name\n",
        "    )\n",
        "    return task.evaluate(experiment_run_name=f\"{timestamp}-{model.lower().replace('.', '-')}\")\n",
        "\n",
        "def compare_models(project_id: str, location: str, experiment_name: str, model_a: str, dataset_uri_a: str, model_b: str, dataset_uri_b: str) -> None:\n",
        "    global _gemini_client\n",
        "    _gemini_client = genai.Client(vertexai=True, project=project_id, location=location)\n",
        "    vertexai.init(project=project_id, location=location)\n",
        "    results_a = run_eval(model_a, dataset_uri_a, experiment_name)\n",
        "    results_b = run_eval(model_b, dataset_uri_b, experiment_name)\n",
        "    clear_output()\n",
        "    print_scores_and_explanations(model_a, results_a)\n",
        "    print_scores_and_explanations(model_b, results_b)\n",
        "    print(f\"\\n{model_a} average alignment score = {results_a.summary_metrics['image_prompt_alignment/mean']:.1f}%\")\n",
        "    print(f\"{model_b} average alignment score = {results_b.summary_metrics['image_prompt_alignment/mean']:.1f}%\")\n",
        "\n",
        "compare_models(\n",
        "    project_id=os.getenv('PROJECT_ID'),\n",
        "    location=os.getenv('LOCATION'),\n",
        "    experiment_name=os.getenv('EXPERIMENT_NAME'),\n",
        "    model_a=\"IMAGEN2\",\n",
        "    dataset_uri_a=os.getenv('DATASET_URI_IMAGEN2'),\n",
        "    model_b=\"IMAGEN3\",\n",
        "    dataset_uri_b=os.getenv('DATASET_URI_IMAGEN3')\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "EhZZ030LJGL3"
      },
      "source": [
        "[Learn more](https://cloud.google.com/vertex-ai/generative-ai/docs/models/determine-eval) about Vertex AI GenAI Evaluation Service."
      ]
    }
  ],
  "metadata": {
    "colab": {
      "provenance": []
    },
    "kernelspec": {
      "display_name": "3.10.12",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.10.12"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}